1 July 2025

Aside - Extreme value modelling

  • Consider the problem of producing high-resolution risk maps of some climate variable for a very large region

Annual maxima

  • Our favourite (easiest!) extremes to model are annual maxima

Extreme value distribution fits

100-year return level
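Here is a minimal sketch of the annual-maxima workflow. It assumes a generalised extreme value (GEV) distribution fitted by maximum likelihood with SciPy's `genextreme`. The daily data are synthetic and the helper `return_level` is hypothetical. The 100-year return level is the level that an annual maximum exceeds with probability 1/100, i.e. the 0.99 quantile of the fitted GEV.

```python
import numpy as np
from scipy.stats import genextreme

rng = np.random.default_rng(1)

# Synthetic stand-in for 60 years of daily values at one grid cell
daily = rng.normal(loc=15.0, scale=5.0, size=(60, 365))
annual_max = daily.max(axis=1)

# Maximum likelihood GEV fit; SciPy's shape parameter c is -xi in the usual notation
c, loc, scale = genextreme.fit(annual_max)

def return_level(period_years, c, loc, scale):
    """Level exceeded by the annual maximum with probability 1 / period_years."""
    return genextreme.ppf(1.0 - 1.0 / period_years, c, loc=loc, scale=scale)

rl10 = return_level(10, c, loc, scale)
rl100 = return_level(100, c, loc, scale)
print(f"10-year: {rl10:.1f}, 100-year: {rl100:.1f}")
```

Producing a risk map repeats this fit cell by cell. With a very large region that is many separate fits, which is part of what makes the problem hard.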

Aims

  • For ADD-TREES we need hi-res climate data to drive JULES
    • we’re aiming for parcel level
    • no daily climate data are available at such high res, so we need to downscale some lower res data
  • mesoclim will downscale UKCP18’s 12km data to parcel level
    • this downscaled data can drive JULES
  • But UKCP18’s 12km data are only available for RCP8.5, which gives a rather incomplete picture of future climate

Method - Temperatures I

  • Consider a single UKCP18 ensemble member
  • Let \(Y_{t, i}\) denote the temperature on day \(t\) for grid cell \(i\)
  • The collection of temperatures for a day is \(\mathbf{Y}_{t} = (Y_{t, 1}, \ldots, Y_{t, n})^{\textsf{T}}\), and for UKCP18 we have \(n = 9184\) in its original form, which covers the UK and Ireland
  • A relatively flexible statistical model for \(\mathbf{Y}_{t}\) is a multivariate Gaussian distribution, i.e. \[ \mathbf{Y}_{t} \sim MVN_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \] and we can easily estimate \(\boldsymbol{\mu}\) and \(\boldsymbol{\Sigma}\) as \[ \hat{\boldsymbol{\mu}} = \dfrac{1}{T} \sum_{t = 1}^T \mathbf{y}_t,~~ \hat{\boldsymbol{\Sigma}} = \dfrac{1}{T - 1} \sum_{t = 1}^T (\mathbf{y}_t - \hat{\boldsymbol{\mu}}) (\mathbf{y}_t - \hat{\boldsymbol{\mu}})^{\textsf{T}} \]
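The two estimators are the sample mean and the unbiased sample covariance. Below is a small NumPy sketch. It assumes the data for one ensemble member are held as a `(T, n)` array with one row per day. The data are synthetic and `n` is kept tiny; with \(n = 9184\), \(\hat{\boldsymbol{\Sigma}}\) alone has about 84 million entries.

```python
import numpy as np

rng = np.random.default_rng(42)
T, n = 3600, 5  # days and grid cells (illustrative sizes)

# Synthetic stand-in for one ensemble member: row t is y_t
true_cov = 0.5 * np.eye(n) + 0.5          # unit variances, correlation 0.5
y = rng.multivariate_normal(np.linspace(8.0, 12.0, n), true_cov, size=T)

mu_hat = y.mean(axis=0)                    # (1/T) sum_t y_t
resid = y - mu_hat
sigma_hat = resid.T @ resid / (T - 1)      # (1/(T-1)) sum_t (y_t - mu)(y_t - mu)^T

# np.cov gives the same estimate (divisor T - 1 by default)
assert np.allclose(sigma_hat, np.cov(y, rowvar=False))

# Simulate 1000 new days from the fitted MVN_n
y_sim = rng.multivariate_normal(mu_hat, sigma_hat, size=1000)
```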

UKCP18 and simulated temperatures

Method - Temperatures II

  • For temperatures, it’s important to capture day-to-day variability
    • assuming a day’s temperatures are independent of yesterday’s isn’t realistic
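  • We can extend the multivariate Gaussian model to be \(2n\)-dimensional \[ \begin{bmatrix} \mathbf{Y}_{t} \\ \mathbf{Y}_{t+1} \end{bmatrix} \sim MVN_{2n}\left( \begin{bmatrix} \boldsymbol{\mu} \\ \boldsymbol{\mu} \end{bmatrix},\, \begin{bmatrix} \boldsymbol{\Sigma} & \boldsymbol{\Psi}\\ \boldsymbol{\Psi}^{\textsf{T}} & \boldsymbol{\Sigma} \end{bmatrix} \right) \] where \(\boldsymbol{\Psi} = \textrm{Cov}(\mathbf{Y}_t, \mathbf{Y}_{t+1})\) is the lag-one cross-covariance. We can easily estimate \(\boldsymbol{\Psi}\) and use it to simulate \[ \mathbf{Y}_{t+1} \mid \mathbf{Y}_{t} = \mathbf{z} \sim MVN_n\left( \tilde{\boldsymbol{\mu}}(\mathbf{z}), \tilde{\boldsymbol{\Sigma}}\right) \] with \(\tilde{\boldsymbol{\mu}}(\mathbf{z}) = \boldsymbol{\mu} + \boldsymbol{\Psi}^{\textsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\) and \(\tilde{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma} - \boldsymbol{\Psi}^{\textsf{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi}\)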
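Here is a sketch of the lag-one model. It assumes \(\boldsymbol{\Psi} = \textrm{Cov}(\mathbf{Y}_t, \mathbf{Y}_{t+1})\) is estimated from consecutive pairs of days. A series is then generated one day at a time from \(MVN_n(\tilde{\boldsymbol{\mu}}(\mathbf{z}), \tilde{\boldsymbol{\Sigma}})\), using the standard Gaussian conditioning results \(\tilde{\boldsymbol{\mu}}(\mathbf{z}) = \boldsymbol{\mu} + \boldsymbol{\Psi}^{\textsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\) and \(\tilde{\boldsymbol{\Sigma}} = \boldsymbol{\Sigma} - \boldsymbol{\Psi}^{\textsf{T}} \boldsymbol{\Sigma}^{-1} \boldsymbol{\Psi}\). `fit_lag1` and `simulate_lag1` are hypothetical helper names, and the input series is synthetic.

```python
import numpy as np

def fit_lag1(y):
    """Estimate mu, Sigma and Psi = Cov(Y_t, Y_{t+1}) from a (T, n) array of days."""
    mu = y.mean(axis=0)
    d = y - mu
    sigma = d.T @ d / (len(y) - 1)
    psi = d[:-1].T @ d[1:] / (len(y) - 1)    # average over the T - 1 consecutive pairs
    return mu, sigma, psi

def simulate_lag1(mu, sigma, psi, n_days, rng):
    """Simulate days sequentially from Y_{t+1} | Y_t = z ~ MVN(mu_tilde(z), Sigma_tilde)."""
    b = np.linalg.solve(sigma, psi)          # Sigma^{-1} Psi
    cond_cov = sigma - psi.T @ b             # Sigma_tilde = Sigma - Psi^T Sigma^{-1} Psi
    cond_cov = (cond_cov + cond_cov.T) / 2   # remove rounding asymmetry
    out = np.empty((n_days, len(mu)))
    out[0] = rng.multivariate_normal(mu, sigma)
    for t in range(1, n_days):
        cond_mean = mu + b.T @ (out[t - 1] - mu)  # mu_tilde(z) = mu + Psi^T Sigma^{-1} (z - mu)
        out[t] = rng.multivariate_normal(cond_mean, cond_cov)
    return out

# Synthetic stand-in: 3 cells following a lag-one autoregression around 10 degrees
rng = np.random.default_rng(0)
y = np.zeros((5000, 3))
for t in range(1, len(y)):
    y[t] = 0.8 * y[t - 1] + rng.normal(size=3)
y += 10.0

mu, sigma, psi = fit_lag1(y)
sim = simulate_lag1(mu, sigma, psi, 2000, rng)
lag1_corr = np.corrcoef(sim[:-1, 0], sim[1:, 0])[0, 1]
```

The simulated series keeps the day-to-day persistence of the input. Independent draws from \(MVN_n(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}})\) would instead give a lag-one correlation near zero.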

UKCP18 and simulated temperatures again

Multivariate Gaussian other-scenario simulations

  • The average over multiple grid cells in a region \(R_j\) can be written as \[ \bar Y_t(R_j) = \dfrac{1}{n_{R_j}} \sum_{i \in R_j} Y_{t, i} \] where \(n_{R_j}\) is the number of cells in region \(R_j\), or equivalently as \[ \bar Y_t(R_j) = \mathbf{r}_j^{\textsf{T}} \mathbf{Y}_t \] where the \(i\)th element of \(\mathbf{r}_j\) is \(1/n_{R_j}\) if cell \(i\) lies in \(R_j\) and 0 otherwise.
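One way to form the vectors \(\mathbf{r}_j\) is sketched below. It assumes each hi-res cell carries an integer label for the lo-res region it falls in; the labels here are made up. Stacking the \(\mathbf{r}_j^{\textsf{T}}\) as rows gives a matrix whose product with \(\mathbf{Y}_t\) returns every regional average at once.

```python
import numpy as np

def region_weights(labels, n_regions):
    """Return a (n_regions, n) matrix whose j-th row is r_j^T for cell labels 0..n_regions-1."""
    labels = np.asarray(labels)
    R = np.zeros((n_regions, labels.size))
    for j in range(n_regions):
        in_region = labels == j
        R[j, in_region] = 1.0 / in_region.sum()   # 1 / n_{R_j} on the cells of R_j
    return R

# Hypothetical example: 6 hi-res cells split between 2 lo-res regions
labels = np.array([0, 0, 1, 0, 1, 1])
R = region_weights(labels, 2)
y_t = np.array([10.0, 11.0, 5.0, 12.0, 6.0, 7.0])
regional_means = R @ y_t   # (bar Y_t(R_0), bar Y_t(R_1))
```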

Multivariate Gaussian other-scenario simulations continued

  • Because of the MVN assumption, we know the distribution of \(\bar Y_t(R_j)\) \[ \bar Y_t(R_j) = \mathbf{r}_j^{\textsf{T}} \mathbf{Y}_t \sim N(\mathbf{r}_j^{\textsf{T}} \boldsymbol{\mu}, \mathbf{r}_j^{\textsf{T}} \boldsymbol{\Sigma} \mathbf{r}_j) \]
  • And we also know that \[ \mathbf{Y}_t \mid \bar Y_t(R_j) = z \sim MVN_n\left(\boldsymbol{\mu}^* , \boldsymbol{\Sigma}^*\right) \] with closed-form expressions for \(\boldsymbol{\mu}^*\) and \(\boldsymbol{\Sigma}^*\).
  • We can extend this to \(p\) constraints with a \(p \times n\) matrix \(\mathbf{A}\) so that \[ \mathbf{Y}_t \mid \mathbf{A} \mathbf{Y}_t = \mathbf{z} \sim MVN_n(\ldots, \ldots) \] which, for \(\mathbf{A}\) of full row rank, is a degenerate distribution of rank \(n - p\)
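Below is a sketch of sampling from \(\mathbf{Y}_t \mid \mathbf{A}\mathbf{Y}_t = \mathbf{z}\). The conditional covariance is singular, so rather than factorising it, the sketch draws \(\mathbf{Y}_t\) unconditionally and applies the standard Gaussian update \(\mathbf{Y}_t + \boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}})^{-1}(\mathbf{z} - \mathbf{A}\mathbf{Y}_t)\). The result has exactly the conditional distribution:

- mean \(\boldsymbol{\mu} + \boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}})^{-1}(\mathbf{z} - \mathbf{A}\boldsymbol{\mu})\)
- covariance \(\boldsymbol{\Sigma} - \boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}}(\mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^{\textsf{T}})^{-1}\mathbf{A}\boldsymbol{\Sigma}\)

This is one possible route; the slides do not say how the simulations were produced. The covariance, regions and constraint values are invented.

```python
import numpy as np

def conditional_draws(mu, sigma, A, z, size, rng):
    """Draw `size` samples of Y ~ MVN_n(mu, Sigma) conditioned on A Y = z."""
    y = rng.multivariate_normal(mu, sigma, size=size)      # unconditional draws, (size, n)
    gain = np.linalg.solve(A @ sigma @ A.T, A @ sigma).T   # Sigma A^T (A Sigma A^T)^{-1}, (n, p)
    return y + (z - y @ A.T) @ gain.T                      # shift each draw onto A Y = z

# Invented setup: 6 cells on a line with exponential spatial covariance
coords = np.arange(6.0)
sigma = 4.0 * np.exp(-np.abs(coords[:, None] - coords[None, :]) / 2.0)
mu = np.full(6, 10.0)

# Two constraints: the average of cells 0-2 and of cells 3-5 (rows are r_j^T)
A = np.array([[1, 1, 1, 0, 0, 0],
              [0, 0, 0, 1, 1, 1]]) / 3.0
z = np.array([12.0, 8.0])   # e.g. lo-res values for another scenario

rng = np.random.default_rng(5)
draws = conditional_draws(mu, sigma, A, z, size=500, rng=rng)
```

Every draw reproduces the regional averages exactly, while the within-region detail varies according to \(\boldsymbol{\Sigma}\).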

Some other-scenario temperature data

Projections at Exeter

Wind components \(u\) and \(v\)

  • A Gaussian model seems okay for \(u\) and \(v\)
  • But if we treat them as independent, the resulting wind speeds \(\sqrt{u^2 + v^2}\) will be nonsensical

Wind components \(u\) and \(v\)

  • Simulated \(u\) and \(v\) pairs

Wind components \(u\) and \(v\)

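  • When conditioning on lo-res \(u\) and \(v\) pairs, it’s not clear whether to average the \(u\)s and \(v\)s
    • it might be more sensible to average the wind speed
    • but wind speed \(\sqrt{u^2 + v^2}\) is nonlinear in \(u\) and \(v\), so conditioning on its average has no closed form and has to be approximated numerically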
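The choice matters because wind speed is nonlinear, as this quick illustration with made-up \((u, v)\) values shows. By the triangle inequality, the speed of the averaged components is never larger than the average speed. The two can differ a lot when wind directions vary within a lo-res cell. Even under a Gaussian model for \((u, v)\), the expected speed generally has to be approximated; here it is done by Monte Carlo.

```python
import numpy as np

# Made-up hi-res (u, v) components (m/s) for 4 cells inside one lo-res cell
u = np.array([4.0, -2.0, 3.0, 0.5])
v = np.array([1.0, 3.0, -1.0, 2.0])

speed_of_mean = np.hypot(u.mean(), v.mean())   # what averaging u and v separately preserves
mean_of_speed = np.hypot(u, v).mean()          # the average wind speed itself

# Expected speed when (u, v) ~ MVN_2(m, C), approximated by Monte Carlo
rng = np.random.default_rng(2)
m = np.array([1.0, 0.5])
C = np.array([[1.0, 0.3], [0.3, 1.0]])
uv = rng.multivariate_normal(m, C, size=100_000)
expected_speed = np.hypot(uv[:, 0], uv[:, 1]).mean()
print(speed_of_mean, mean_of_speed, expected_speed, np.hypot(*m))
```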

Non-Gaussian data

  • Now suppose that \(Y_{t, i} \sim F_{i}\)
    • marginal cumulative distribution function (CDF) \(F_{i}\) has empirical estimate \(\hat F_{i}\)
    • and corresponding inverse \(\hat F_{i}^{-1}\)
  • Then, for continuous \(F_{i}\), \(\Phi^{-1}(F_{i}(Y_{t, i})) \sim N(0, 1)\), where \(\Phi\) denotes the standard Gaussian CDF
    • we proceed by modelling \(\hat{\mathbf{Z}}_{t} = (\hat Z_{t, 1}, \ldots, \hat Z_{t, n})^{\textsf{T}}\), with \(\hat Z_{t, i} = \Phi^{-1}(\hat F_{i}(Y_{t, i}))\), as \[ \hat{\mathbf{Z}}_{t} \sim MVN_n(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \]
  • We can then simulate \(\mathbf{Z}_{t}\) from \(MVN_n(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}})\) and back-transform with \(\hat Y_{t, i} = \hat F_{i}^{-1}(\Phi(Z_{t, i}))\).
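Here is a sketch of the transformation and back-transformation, under two assumptions:

- \(\hat F_i\) is the empirical CDF rescaled to rank\(/(T+1)\), a common choice that keeps \(\Phi^{-1}\) finite at the largest observation.
- \(\hat F_i^{-1}\) is approximated by NumPy's interpolated sample quantiles.

The helper names and the synthetic cloud-cover-like data are hypothetical.

```python
import numpy as np
from scipy.stats import norm, rankdata

def to_gaussian(y):
    """Z_hat[t, i] = Phi^{-1}(F_hat_i(y[t, i])), with F_hat_i = rank / (T + 1) within column i."""
    T = y.shape[0]
    return norm.ppf(rankdata(y, axis=0) / (T + 1))

def from_gaussian(z, y_obs):
    """Y_hat[t, i] = F_hat_i^{-1}(Phi(z[t, i])), using interpolated sample quantiles of y_obs[:, i]."""
    u = norm.cdf(z)
    return np.column_stack([np.quantile(y_obs[:, i], u[:, i]) for i in range(y_obs.shape[1])])

# Synthetic cloud-cover-like data (%) for 3 cells, correlated through a shared component
rng = np.random.default_rng(7)
T, n = 2000, 3
y = 100 * norm.cdf(rng.normal(size=(T, 1)) + 0.5 * rng.normal(size=(T, n)))

z_hat = to_gaussian(y)
mu_hat, sigma_hat = z_hat.mean(axis=0), np.cov(z_hat, rowvar=False)
z_sim = rng.multivariate_normal(mu_hat, sigma_hat, size=500)
y_sim = from_gaussian(z_sim, y)   # back on the original (percentage) scale
```

Because the back-transform uses empirical quantiles, simulated values stay within the observed range of each cell, e.g. within 0 to 100% for cloud cover.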

Non-Gaussian data

  • With non-Gaussian margins, \(\bar Y_t\) is no longer Gaussian
  • However, \(\bar Z_t = n^{-1} \mathbf{1}_n^{\textsf{T}} \mathbf{Z}_t\) is still Gaussian, as a linear combination of \(\mathbf{Z}_t\)
  • We therefore estimate an empirical relationship between \(\bar Z_t\) and \(\bar Y_t\), assuming that \(\hat{\bar Z}_t = g(\hat{\bar Y}_t) + \epsilon_t\)

Non-Gaussian data

  • Let’s consider cloud cover percentage
  • Here we see \(\hat{\bar z}_t\) plotted against \(\hat{\bar y}_t\), together with \(\hat g()\), a cubic spline estimate of \(g()\)
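Below is a sketch of estimating \(g\) for cloud cover. It assumes a least-squares cubic regression spline with interior knots at the quartiles of \(\bar y_t\), fitted with SciPy's `LSQUnivariateSpline`. The fitting method and knot placement used for the plots are not stated, and the data are synthetic. Once \(\hat g\) is available, a lo-res cloud-cover value can be mapped to a Gaussian-scale target for \(\bar Z_t\), on which the Gaussian conditioning can operate.

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline
from scipy.stats import norm, rankdata

# Synthetic cloud-cover-like data (%) for 4 cells, correlated through a shared component
rng = np.random.default_rng(11)
T, n = 3000, 4
y = 100 * norm.cdf(rng.normal(size=(T, 1)) + 0.5 * rng.normal(size=(T, n)))
z = norm.ppf(rankdata(y, axis=0) / (T + 1))      # Gaussian scale, cell by cell

ybar, zbar = y.mean(axis=1), z.mean(axis=1)       # daily averages on each scale
order = np.argsort(ybar)                           # spline fitting needs increasing x
knots = np.quantile(ybar, [0.25, 0.5, 0.75])       # interior knots (an assumption)
g_hat = LSQUnivariateSpline(ybar[order], zbar[order], knots, k=3)

# Gaussian-scale targets for bar Z_t given lo-res cloud cover values (%)
targets = g_hat(np.array([10.0, 30.0, 50.0, 70.0, 90.0, 95.0]))
residual_sd = np.std(zbar[order] - g_hat(ybar[order]))
```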

Non-Gaussian data

  • Gaussian scale CLT simulations given total percentages of 10, 30, 50, 70, 90 and 95%.

Non-Gaussian data

  • Original scale CLT simulations given total percentages of 10, 30, 50, 70, 90 and 95%.

Even more variables

  • We don’t just need to model \(u\) and \(v\) as dependent
    • we should treat all the variables going into mesoclim, and then JULES, as mutually dependent
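One way to let all variables depend on each other within the same Gaussian framework is sketched below. It assumes each variable has first been put on a Gaussian scale where needed. The \(k\) variables for day \(t\) are stacked into a single \(kn\)-vector and one joint covariance is estimated; its off-diagonal blocks hold the cross-variable covariances. The variable names and data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
T, n = 2000, 4   # days and grid cells (illustrative sizes)

# Synthetic daily fields for three variables sharing a common driver
driver = rng.normal(size=(T, 1))
fields = {
    "temperature": 12.0 + 2.0 * driver + rng.normal(size=(T, n)),
    "u": 3.0 - 1.5 * driver + rng.normal(size=(T, n)),
    "v": 1.0 + 1.0 * driver + rng.normal(size=(T, n)),
}

# Stack variables for each day: columns are [temperature cells, u cells, v cells]
y = np.hstack(list(fields.values()))   # shape (T, 3n)
mu_hat = y.mean(axis=0)
sigma_hat = np.cov(y, rowvar=False)    # shape (3n, 3n)

# Cross-covariance block between u and v, which independence would force to zero
cov_uv = sigma_hat[n:2 * n, 2 * n:3 * n]
```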

Seasonal variation

  • Variables change from month to month
    • not just means, but also covariances
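A simple way to let both means and covariances change through the year is to fit separate \(\hat{\boldsymbol{\mu}}_m\) and \(\hat{\boldsymbol{\Sigma}}_m\) for each calendar month \(m\), using only the days in that month. This sketch shows one possible approach, not necessarily the method used here. The month labels, the helper `monthly_parameters` and the data are hypothetical.

```python
import numpy as np

def monthly_parameters(y, months):
    """Return {month: (mu_hat_m, sigma_hat_m)} from a (T, n) array and length-T month labels 1-12."""
    params = {}
    for m in np.unique(months):
        y_m = y[months == m]
        params[int(m)] = (y_m.mean(axis=0), np.cov(y_m, rowvar=False))
    return params

# Synthetic data: 30 years of 12 months x 30 days for 3 cells
rng = np.random.default_rng(4)
n_years, n = 30, 3
months = np.tile(np.repeat(np.arange(1, 13), 30), n_years)
seasonal_mean = 10.0 - 6.0 * np.cos(2 * np.pi * (months - 1) / 12)   # cold winters, warm summers
seasonal_sd = np.where((months >= 6) & (months <= 8), 1.5, 1.0)      # more variable in summer
y = seasonal_mean[:, None] + seasonal_sd[:, None] * rng.normal(size=(months.size, n))

params = monthly_parameters(y, months)
jan_mu, jan_sigma = params[1]
jul_mu, jul_sigma = params[7]
```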

Summary

  • Generating UKCP18-like data for other SSPs requires a few considerations
  • Dependencies between data must be preserved
    • if variables are treated as independent, this will have consequences for downscaling and subsequent estimates of tree growth
  • Gaussian models are mathematically convenient, but transformations are needed for them to sensibly represent some variables, such as cloud cover
  • Conditioning on low-res other-SSP data seems to give sensible projections into the future
  • SSTs still need to be considered
    • and probably need to be consistent with the UKCP18 variables